ABSTRACT

Analyzing networks requires complex algorithms to extract
meaningful information. Centrality metrics have been shown to
be correlated with the importance and load of the nodes in
network traffic. Here, we are interested in the problem of
centrality-based network management. The problem has many
applications, such as verifying the robustness of networks and
controlling or improving entity dissemination. It can be
defined as finding a small set of topological network
modifications which yield a desired closeness centrality
configuration. As a fundamental building block to tackle that
problem, we propose incremental algorithms which efficiently
update the closeness centrality values upon changes in network
topology, i.e., edge insertions and deletions. Our algorithms
are shown to be efficient on many real-life networks, especially
on small-world networks, which have a small diameter and a
spike-shaped shortest distance distribution. In addition to
closeness centrality, they can also serve as a powerful toolkit
for the shortest-path-based management and analysis of networks.
We experimentally validate the efficiency of our algorithms on
large networks and show that they update the closeness
centrality values of the temporal DBLP coauthorship network of
1.2 million users 460 times faster than it would take to compute
them from scratch. To the best of our knowledge, this is the
first work which can yield practical large-scale network
management based on closeness centrality values.

Categories and Subject Descriptors 

E.1 [Data]: Graphs and Networks; G.2.2 [Discrete Mathematics]:
Graph Theory - Graph algorithms

General Terms 

Algorithms, Performance, Experimentation 

Keywords 

Closeness centrality, centrality management, dynamic networks,
small-world networks

1. INTRODUCTION 

Centrality metrics, such as closeness or betweenness, quantify
how central a node is in a network. They have been successfully
used to carry out analyses for various purposes, such as the
structural analysis of knowledge networks [23, 26], power grid
contingency analysis [14], quantifying importance in social
networks [20], the analysis of covert networks [16] and of
decision/action networks [5], and even finding the best store
locations in cities [25]. Several works in the literature aim to
compute these metrics rapidly. The algorithm with the best
asymptotic complexity for computing centrality metrics [2] is
believed to be asymptotically optimal [15]. Research has focused
either on approximation algorithms for computing centrality
metrics [3, 8, 21] or on high performance computing techniques
[18, 27]. Today, it is common to find large networks, and we are
always in search of better techniques for performing
centrality-based analysis on them.

When the network topology is modified, ensuring the correctness
of the centralities is a challenging task. This problem has been
studied for dynamic and streaming networks [10, 17]. Even for
some applications involving a static network, such as the
contingency analysis of power grids and the robustness
evaluation of networks, we need to know how the centrality
values change when the network topology is modified by an
adversary or by external events such as natural disasters, so
that we can be prepared and take proactive measures.

A similar problem arises in network management, where the concern
is not only knowing but also setting the centrality values in a
controlled manner via topology modifications, in order to speed
up or contain entity dissemination. The problem is hard: there
are m candidate edges to delete and O(n^2) candidate edges to
insert, where n and m are the number of nodes and edges in the
network, respectively. Here, the main motivation can be
calibrating the importance/load of some or all of the vertices
as desired, matching their loads to their capacities, boosting
content spread, or making the network immune to adversarial
attacks. Similar problems, such as finding the most
cost-effective way to reduce the entity dissemination ability of
a network [24] or finding a small set of edges whose deletion
maximizes the shortest-path length [13], have been investigated
in the literature. The problem has recently regained a lot of
attention: Tong et al. [28] carried out a generic study which
uses edge insertions and deletions. They use the changes in the
leading eigenvalue to control or speed up the dissemination
process. Other recent works investigate edge insertions to
minimize the average shortest path distance [22] or to boost
content spread [4]. From the centrality point of view, there
exist studies which focus on maximizing the centrality of a node
set [9, 12] or of a single node [12] by edge insertions. In the
generic centrality-based network management problem, the desired
centralities of all the nodes need to be obtained or approximated
with a small set of topology modifications. As Figure 1 shows,
the effect of a local topology modification is usually global.
Furthermore, existing algorithms for incremental centrality
computation are not efficient enough to be used in practice.
Thus, novel incremental algorithms are essential to quickly
evaluate the effects of topology modifications on centrality
values.




Figure 1: A toy network with nine nodes, three consecutive edge
insertions/deletions (ah, fh, and ab, respectively), and the
resulting closeness centrality values.

Our contributions can be summarized as follows: 

1. To attack the variants of the centrality-based network
management problem, we propose incremental algorithms which
efficiently update the closeness centralities upon edge
insertions and deletions.

2. The proposed algorithms can serve as a fundamental building
block for other shortest-path-based network analyses, such as
the temporal analysis of past network data, maintaining
centrality on streaming networks, or minimizing/maximizing the
average shortest-path distance via edge insertions and
deletions.

3. Compared with the existing algorithms, our algorithms have a
low memory footprint, making them practical and applicable to
very large graphs. For random edge insertions/deletions on the
Wikipedia users' communication graph, we reduced the centrality
(re)computation time from 2 days to 16 minutes. For the
real-life temporal DBLP coauthorship network, we reduced the
time from 1.3 days to 4.2 minutes.

4. The proposed techniques can easily be adapted to algorithms
for approximating centralities. As a result, one can employ more
accurate and faster sampling and obtain better approximations.

The rest of the paper is organized as follows: Section 2
introduces the notation and formally defines the closeness
centrality metric. Section 3 defines the network management
problems we are interested in. Our algorithms are explained in
detail in Section 4. Existing approaches are described in
Section 5, and the experimental analysis is given in Section 6.
Section 7 concludes the paper.



2. BACKGROUND 

Let G = (V, E) be a network modeled as a simple graph with
n = |V| vertices and m = |E| edges, where each node is
represented by a vertex in V, and a node-node interaction is
represented by an edge in E. Let $\Gamma_G(v)$ be the set of
vertices which are adjacent to v in G.

A graph G' = (V', E') is a subgraph of G if $V' \subseteq V$ and
$E' \subseteq E$. A path is a sequence of vertices such that there
exists an edge between consecutive vertices. A path between two
vertices s and t is denoted by $s \rightsquigarrow t$ (we sometimes
use $s \overset{P}{\rightsquigarrow} t$ to denote a specific path P
with endpoints s and t). Two vertices $u, v \in V$ are connected if
there is a path from u to v. If all vertex pairs are connected,
we say that G is connected. If G is not connected, then it is
disconnected, and each maximal connected subgraph of G is a
connected component, or a component, of G. We use $d_G(u, v)$ to
denote the length of the shortest path between two vertices u, v
in a graph G. If u = v, then $d_G(u, v) = 0$, and if u and v are
disconnected, then $d_G(u, v) = \infty$.

Given a graph G = (V, E), a vertex $v \in V$ is called an
articulation vertex if the graph $G - v$ (obtained by removing v)
has more connected components than G. Similarly, an edge
$e \in E$ is called a bridge if $G - e$ (obtained by removing e
from E) has more connected components than G. G is biconnected
if it is connected and it does not contain an articulation
vertex. A maximal biconnected subgraph of G is a biconnected
component.

2.1 Closeness Centrality 

Given a graph G, the farness of a vertex u is defined as

$$far[u] = \sum_{v \in V,\ d_G(u,v) \neq \infty} d_G(u, v),$$

and the closeness centrality of u is defined as

$$cc[u] = \frac{1}{far[u]}. \tag{1}$$

If u cannot reach any other vertex in the graph, cc[u] = 0.

For a sparse unweighted graph G = (V, E) with |V| = n and
|E| = m, the complexity of cc computation is O(n(m + n)). For
each vertex s, Algorithm 1 executes a Single-Source Shortest
Paths (SSSP) algorithm. It initiates a breadth-first search (BFS)
from s, computes the distances to the other vertices, and
computes far[s], the sum of the distances which are different
from ∞. As the last step, it computes cc[s]. Since a BFS takes
O(m + n) time and n SSSPs are required in total, the complexity
follows.
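As a minimal, illustrative sketch of Algorithm 1 (not the authors' implementation), the following C code runs one BFS per source over a graph stored in compressed row storage (CRS), the in-memory format mentioned in Section 6. The Graph struct, its field names, and the function names are hypothetical. Unreachable vertices are skipped when summing distances, and cc[s] is set to 0 when s reaches no other vertex; the sketch assumes n ≥ 1.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CRS graph: the neighbors of v are
   adj[xadj[v]], ..., adj[xadj[v + 1] - 1]. */
typedef struct {
    int n;
    const int *xadj;
    const int *adj;
} Graph;

/* One SSSP (BFS) from s. Returns far[s], the sum of the finite
   distances from s; dist and queue are caller buffers of length n. */
static long bfs_farness(const Graph *g, int s, int *dist, int *queue) {
    for (int v = 0; v < g->n; v++) dist[v] = -1;  /* -1 encodes infinity */
    int head = 0, tail = 0;
    long far = 0;
    dist[s] = 0;
    queue[tail++] = s;
    while (head < tail) {
        int v = queue[head++];
        for (int k = g->xadj[v]; k < g->xadj[v + 1]; k++) {
            int w = g->adj[k];
            if (dist[w] < 0) {          /* first visit: w is one level deeper */
                dist[w] = dist[v] + 1;
                far += dist[w];
                queue[tail++] = w;
            }
        }
    }
    return far;
}

/* Algorithm 1: one BFS per source, O(n(m + n)) time overall. */
void closeness_all(const Graph *g, double *cc) {
    int *dist = malloc((size_t)g->n * sizeof *dist);
    int *queue = malloc((size_t)g->n * sizeof *queue);
    assert(dist != NULL && queue != NULL);
    for (int s = 0; s < g->n; s++) {
        long far = bfs_farness(g, s, dist, queue);
        cc[s] = far > 0 ? 1.0 / (double)far : 0.0;  /* 0 if s reaches nobody */
    }
    free(dist);
    free(queue);
}
```

For example, on the path 0-1-2 plus an isolated vertex 3, it yields cc = {1/3, 1/2, 1/3, 0}.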

3. PROBLEM DEFINITIONS 

The following problem can be considered as a generalized 
version of the problems investigated in [9, 12]. 

Definition 3.1. (Centrality-based network management) Let
G = (V, E) be a graph. Given a centrality metric C, a target
centrality vector c', and an upper bound U on the number of
inserted/deleted edges, construct a graph G' = (V, E') such that
$|E \,\triangle\, E'| \le U$ and $\|c' - c_{G'}\|$ is minimized.

In this work, we are interested in the closeness metric 
which is based on shortest paths. Hence, implicitly, we are 
also interested in the following problem partly investigated 
in [13, 22, 24]. 



Algorithm 1: CC: Basic centrality computation

Data: G = (V, E)
Output: cc[.]
for each s ∈ V do
    ▷ SSSP(G, s) with centrality computation
    Q ← empty queue
    d[v] ← ∞, ∀v ∈ V \ {s}
    Q.push(s), d[s] ← 0
    far[s] ← 0
    while Q is not empty do
        v ← Q.pop()
        for all w ∈ Γ_G(v) do
            if d[w] = ∞ then
                Q.push(w)
                d[w] ← d[v] + 1
                far[s] ← far[s] + d[w]
    cc[s] ← 1 / far[s]
return cc[.]



Definition 3.2. (Shortest-path-based network management) Let
G = (V, E) be a graph. Given an upper bound U on the number of
inserted/deleted edges, construct a graph G' = (V, E') where
$|E \,\triangle\, E'| \le U$ and the (average) shortest-path length
in G' is minimized/maximized.

These problems and their variants have several applications,
such as slowing down pathogen outbreaks, increasing the
efficiency of advertisements, and analyzing the robustness of a
network. Consider an airline company that has flights to
thousands of airports and aims to add some new routes to
increase the load of some underutilized airports. When a new
route is inserted, all the airport centralities need to be
recomputed in order to evaluate its overall impact, which is
quite expensive. Hence, we need efficient incremental algorithms
to tackle this problem. Such algorithms can be used as a
fundamental building block for centrality- and
shortest-path-based network management problems (and their
variants), as well as for temporal centrality/shortest-path
analyses and dynamic network analyses. In this work, we
investigate this subproblem.

Definition 3.3. (Incremental closeness centrality) Given a graph
G = (V, E), its centrality vector cc, and an edge uv, find the
centrality vector cc' of the graph G' = (V, E ∪ {uv}) (or
G' = (V, E \ {uv})).

4. MAINTAINING CENTRALITY 

Many interesting real-life networks are scale free. The
diameters of these networks grow proportionally to the logarithm
of the number of nodes. That is, even with hundreds of millions
of vertices, the diameter is small, and when the graph is
modified with minor updates, it tends to stay small. Combining
this with their power-law degree distribution, we obtain the
spike-shaped shortest-distance distribution shown in Figure 3.
We use two main approaches, work filtering and SSSP
hybridization, to exploit these observations and reduce the
centrality computation time.

4.1 Work Filtering 

For efficient maintenance of closeness centrality upon an edge
insertion/deletion, we propose a work filter which reduces the
number of SSSPs in Algorithm 1 and the cost of each SSSP. Work
filtering uses three techniques: filtering with level
differences, with biconnected component decomposition, and with
identical vertices.



4.1.1 Filtering with level differences 

The motivation of level-based filtering is to detect unnecessary
updates and filter them out. Let G = (V, E) be the current graph
and uv be an edge to be inserted into G. Let G' = (V, E ∪ {uv})
be the updated graph. The centrality definition in (1) implies
that for a vertex $s \in V$, if $d_G(s,t) = d_{G'}(s,t)$ for all
$t \in V$, then cc'[s] = cc[s]. The following theorem is used to
detect such vertices and filter their SSSPs.

Theorem 4.1. Let G = (V, E) be a graph and u and v be two
vertices in V s.t. $uv \notin E$. Let G' = (V, E ∪ {uv}). Then
cc'[s] = cc[s] if and only if $|d_G(s,u) - d_G(s,v)| \le 1$.

Proof. If s is disconnected from u and v, the insertion of uv
will not change the closeness centrality of s. Hence,
cc'[s] = cc[s]. If s is connected to only one of u and v in G,
the difference $|d_G(s,u) - d_G(s,v)|$ is ∞, and the closeness
centrality score of s needs to be updated by using the new,
larger connected component containing s.

When s is connected to both u and v in G, we investigate the
edge insertion in three cases, as shown in Figure 2:

Case 1. $d_G(s,u) = d_G(s,v)$: Assume that the path
$s \overset{P}{\rightsquigarrow} u - v \overset{P'}{\rightsquigarrow} t$
is a shortest $s \rightsquigarrow t$ path in G' containing uv. Since
$d_G(s,u) = d_G(s,v)$, there exists another path
$s \overset{P''}{\rightsquigarrow} v \overset{P'}{\rightsquigarrow} t$
in G' with one fewer edge. Hence, uv cannot be in a shortest
path: $\forall t \in V,\ d_G(s,t) = d_{G'}(s,t)$.

Case 2. $|d_G(s,u) - d_G(s,v)| = 1$: Let $d_G(s,u) < d_G(s,v)$ and
assume that
$s \overset{P}{\rightsquigarrow} u - v \overset{P'}{\rightsquigarrow} t$
is a shortest path in G' containing uv. Since
$d_G(s,v) = d_G(s,u) + 1$, there exists another path
$s \overset{P''}{\rightsquigarrow} v \overset{P'}{\rightsquigarrow} t$
in G with the same number of edges. Hence,
$\forall t \in V,\ d_G(s,t) = d_{G'}(s,t)$.

Case 3. $|d_G(s,u) - d_G(s,v)| > 1$: Let $d_G(s,u) < d_G(s,v)$.
The path $s \rightsquigarrow u - v$ in G' is shorter than the
shortest $s \rightsquigarrow v$ path in G, since
$d_G(s,v) > d_G(s,u) + 1$. Hence, an update on cc[s] is
necessary. □




Figure 2 (panels: Case 1, Case 2, Case 3): Three cases of edge
insertion: when an edge uv is inserted into the graph G, for each
vertex s, one of the following holds: (a) $d_G(s,u) = d_G(s,v)$,
(b) $|d_G(s,u) - d_G(s,v)| = 1$, and (c)
$|d_G(s,u) - d_G(s,v)| > 1$.

Although Theorem 4.1 yields a filter only for edge insertions,
the following corollary, which is used for edge deletions,
easily follows.

Corollary 4.2. Let G = (V, E) be a graph and u and v be two
vertices in V s.t. $uv \in E$. Let G' = (V, E \ {uv}). Then
cc'[s] = cc[s] if and only if $|d_{G'}(s,u) - d_{G'}(s,v)| \le 1$.

With this corollary, the work filter can be implemented for both
edge insertions and deletions. The pseudocode of the update
algorithm for an edge insertion is given in Algorithm 2. When an
edge uv is inserted/deleted, to employ the filter, we first
compute the distances from u and v to all other vertices. Then,
the algorithm skips the SSSPs of the vertices satisfying the
condition of Theorem 4.1.



Algorithm 2: Simple work filtering

Data: G = (V, E), cc[.], uv
Output: cc'[.]
G' ← (V, E ∪ {uv})
du[.] ← SSSP(G, u)    ▷ distances from u in G
dv[.] ← SSSP(G, v)    ▷ distances from v in G
for each s ∈ V do
    if |du[s] − dv[s]| ≤ 1 then
        cc'[s] ← cc[s]
    else
        ▷ use the computation in Algorithm 1 with G'
return cc'[.]
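The filter of Algorithm 2 can be sketched in C as follows, using a hypothetical CRS Graph layout (neighbors of v in adj[xadj[v]] .. adj[xadj[v + 1] - 1]). It only marks which sources need a new SSSP on G'; recomputing those sources (for instance with one BFS each, as in Algorithm 1) is left out. Distances of -1 stand for ∞, and a vertex that reaches neither u nor v is treated as unaffected, following the proof of Theorem 4.1. The names are illustrative, not taken from the authors' code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CRS graph layout. */
typedef struct {
    int n;
    const int *xadj;
    const int *adj;
} Graph;

/* BFS distances from s; -1 encodes infinity. */
static void bfs_dist(const Graph *g, int s, int *dist, int *queue) {
    int head = 0, tail = 0;
    for (int v = 0; v < g->n; v++) dist[v] = -1;
    dist[s] = 0;
    queue[tail++] = s;
    while (head < tail) {
        int v = queue[head++];
        for (int k = g->xadj[v]; k < g->xadj[v + 1]; k++) {
            int w = g->adj[k];
            if (dist[w] < 0) {
                dist[w] = dist[v] + 1;
                queue[tail++] = w;
            }
        }
    }
}

/* Level filter of Theorem 4.1 for inserting uv (uv not yet in g).
   Sets needs_update[s] = 1 iff cc[s] must be recomputed on the
   graph with uv added; returns how many SSSPs remain. */
int level_filter(const Graph *g, int u, int v, unsigned char *needs_update) {
    int *du = malloc((size_t)g->n * sizeof *du);
    int *dv = malloc((size_t)g->n * sizeof *dv);
    int *queue = malloc((size_t)g->n * sizeof *queue);
    assert(du != NULL && dv != NULL && queue != NULL);
    bfs_dist(g, u, du, queue);
    bfs_dist(g, v, dv, queue);
    int count = 0;
    for (int s = 0; s < g->n; s++) {
        int update;
        if (du[s] < 0 && dv[s] < 0)
            update = 0;                      /* s reaches neither endpoint */
        else if (du[s] < 0 || dv[s] < 0)
            update = 1;                      /* difference is infinite */
        else
            update = abs(du[s] - dv[s]) > 1; /* Case 3 of the proof */
        needs_update[s] = (unsigned char)update;
        count += update;
    }
    free(du);
    free(dv);
    free(queue);
    return count;
}
```

On the path 0-1-2-3, inserting the edge 03 marks only 0 and 3: vertices 1 and 2 have $|d_G(s,0) - d_G(s,3)| = 1$ and keep their farness of 4. For a deletion, Corollary 4.2 suggests calling the same routine on G' instead of G.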



In theory, filtering by levels can reduce the update time
significantly. However, in practice, its effectiveness depends on
the underlying structure of G. Many real-life networks have been
repeatedly shown to possess unique characteristics, such as a
small diameter and a power-law degree distribution [19], and
information spreads extremely fast in them [6, 7]. The proposed
filter exploits one of these characteristics for efficient
closeness centrality updates: the distribution of shortest-path
lengths. Its efficiency is based on the phenomenon shown in
Figure 3 for a set of graphs used in our experiments: the
probability that a shortest-path length is equal to x is
unimodal and spike-shaped for many social networks and also for
some others. This is the outcome of the short diameter and the
power-law degree distribution. Intuitively, when most pairwise
distances fall within a narrow range, $d_G(s,u)$ and $d_G(s,v)$
differ by at most one for most vertices s, so most SSSPs are
filtered. On the other hand, for some spatial networks such as
road networks, there are no sharp peaks, and the shortest-path
distances are distributed more uniformly. The work filter we
propose here favors the former.
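One way to measure the distribution plotted in Figure 3 is sketched below in C. The paper does not say how its plot was produced, so this exhaustive version (one BFS per source, O(n(m + n)) time) is an assumption; for large graphs one would likely run the BFSs from a random sample of sources instead. The CRS Graph layout and the function name are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CRS graph layout. */
typedef struct {
    int n;
    const int *xadj;
    const int *adj;
} Graph;

/* Counts ordered pairs (s, t), s != t, connected in g, by distance:
   hist[x] = number of pairs with d(s, t) = x, for 0 <= x < max_d.
   Returns the total number of connected ordered pairs. */
long distance_histogram(const Graph *g, long *hist, int max_d) {
    int *dist = malloc((size_t)g->n * sizeof *dist);
    int *queue = malloc((size_t)g->n * sizeof *queue);
    assert(dist != NULL && queue != NULL);
    for (int x = 0; x < max_d; x++) hist[x] = 0;
    long pairs = 0;
    for (int s = 0; s < g->n; s++) {
        int head = 0, tail = 0;
        for (int v = 0; v < g->n; v++) dist[v] = -1;  /* -1 = unvisited */
        dist[s] = 0;
        queue[tail++] = s;
        while (head < tail) {
            int v = queue[head++];
            for (int k = g->xadj[v]; k < g->xadj[v + 1]; k++) {
                int w = g->adj[k];
                if (dist[w] >= 0) continue;
                dist[w] = dist[v] + 1;
                queue[tail++] = w;
                pairs++;
                if (dist[w] < max_d) hist[dist[w]]++;  /* longer ones only counted in pairs */
            }
        }
    }
    free(dist);
    free(queue);
    return pairs;
}
```

Dividing hist[x] by the returned count gives the probability that a connected pair is at distance x; on the path 0-1-2-3 the counts are 6, 4, and 2 for x = 1, 2, 3 out of 12 ordered pairs.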
Figure 3: Probability that the distance between two (connected)
vertices is equal to x, for four social and web networks.

4.1.2 Filtering with biconnected components

Our work filter can be enhanced by employing and maintaining a
biconnected component decomposition (BCD) of G = (V, E). A BCD
is a partitioning Π of the edge set E, where Π(e) indicates the
component of each edge $e \in E$. A toy graph and its BCDs before
and after an edge insertion are given in Figure 4.

Figure 4 (panels: (a) G, (b) Π, (c) Π'): A graph G (left), its
biconnected component decomposition Π into 4 components
(middle), and the updated Π' with 3 components when the edge bd
is inserted (right). The sets of articulation vertices before
and after the edge insertion are {b, c, d} and {b, d},
respectively. After the edge insertion, cid = 2; that is, the
second component contains the new edge. Hence, the update
algorithm first extracts biconnected component 2 and executes
SSSPs only for the vertices {b, c, d}. It then initiates a fixing
phase to update the closeness centrality values of the rest of
the vertices. After the edge insertion, rep[a] = b and
rep[e] = rep[f] = rep[g] = rep[h] = d. Hence, R[b] = 2,
R[c] = 1, and R[d] = 5. And RF[b] = 1, RF[c] = 0, and RF[d] = 6.

When uv is inserted into G = (V, E) and G' = (V, E ∪ {uv}) is
obtained, we check whether

$$\{\Pi(uw) : w \in \Gamma_G(u)\} \cap \{\Pi(vw) : w \in \Gamma_G(v)\}$$

is empty or not. If the intersection is not empty, there will be
only one element in it, cid, which is the id of the biconnected
component of G' containing uv (otherwise Π is not a valid BCD).
In this case, Π'(e) is set to Π(e) for all $e \in E$, and Π'(uv)
is set to cid. If there is no biconnected component containing
both u and v (see Figure 4(c)), i.e., if the intersection above
is empty, we construct Π' from scratch and set cid = Π'(uv). Π
can be computed in linear, O(m + n), time [11]. Hence, the cost
of BCD maintenance is negligible compared to the cost of
updating closeness centrality.

Let $G'_{cid} = (V_{cid}, E'_{cid})$ be the biconnected component
of G' containing uv, where

$$V_{cid} = \{v \in V : cid \in \{\Pi'(vw) : vw \in E'\}\},$$
$$E'_{cid} = \{e \in E' : \Pi'(e) = cid\}.$$

Let $A_{cid} \subseteq V_{cid}$ be the set of articulation vertices of
G' that lie in $V_{cid}$. Given Π', it is easy to detect the
articulation vertices, since u is an articulation vertex if and
only if it is part of at least two components in the BCD:
$|\{\Pi'(uw) : uw \in E'\}| > 1$.
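The two BCD checks described above, the intersection test that decides whether Π' can reuse Π and the articulation test on Π', can be sketched in C as follows. The sketch assumes a hypothetical layout in which comp[k] stores the component id (0 .. ncomp-1) of the edge at adj[k] in the CRS arrays, with both copies of an undirected edge carrying the same id; computing Π itself (e.g., with the linear-time algorithm of [11]) is not shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CRS graph layout. */
typedef struct {
    int n;
    const int *xadj;
    const int *adj;
} Graph;

/* Returns the component shared by an edge of u and an edge of v,
   or -1 if the intersection is empty (then Pi' must be rebuilt).
   A marker array over component ids keeps the test O(deg(u) + deg(v)). */
int common_component(const Graph *g, const int *comp, int ncomp, int u, int v) {
    unsigned char *seen = calloc((size_t)ncomp, 1);
    assert(seen != NULL);
    for (int k = g->xadj[u]; k < g->xadj[u + 1]; k++) seen[comp[k]] = 1;
    int cid = -1;
    for (int k = g->xadj[v]; k < g->xadj[v + 1] && cid < 0; k++)
        if (seen[comp[k]]) cid = comp[k];
    free(seen);
    return cid;
}

/* x is an articulation vertex iff its edges span at least two components. */
int is_articulation(const Graph *g, const int *comp, int x) {
    for (int k = g->xadj[x] + 1; k < g->xadj[x + 1]; k++)
        if (comp[k] != comp[g->xadj[x]]) return 1;
    return 0;
}
```

For example, on a 4-cycle 0-1-2-3 with a pendant edge 34, inserting 02 finds the common component 0, inserting 14 finds none (so Π' would be rebuilt), and only vertex 3 is reported as an articulation vertex.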

We will execute SSSPs only for the vertices in $G'_{cid}$ and use
the new values to fix the centralities for the rest of the
graph. The contributions of the vertices in $V \setminus V_{cid}$
are integrated into the SSSPs by using a representative function
$rep : V \to V_{cid} \cup \{null\}$, which maps each vertex
$v \in V$ either to a representative in $G'_{cid}$ or to null (if v
and the vertices in $V_{cid}$ are in different connected components
of G').

For each vertex $u \in V_{cid}$, we set rep[u] = u. For the other
vertices, let $\overline{G'_{cid}} = (V, E' \setminus E'_{cid})$. If a
vertex $v \in V \setminus V_{cid}$ and an articulation vertex
$u \in A_{cid}$ are connected in $\overline{G'_{cid}}$, i.e.,
$d_{\overline{G'_{cid}}}(u,v) \neq \infty$, we say that v is
represented by u in $\overline{G'_{cid}}$ and set rep[v] = u.
Otherwise, rep[v] is set to null. The following theorem states
that rep is well defined: each vertex is represented by at most
one vertex.

Theorem 4.3. For each v in $V \setminus V_{cid}$, there is at most
one articulation vertex $u \in A_{cid}$ such that
$d_{\overline{G'_{cid}}}(u,v) \neq \infty$.

Proof. The proof directly follows from the definition of BCD and
is omitted. □

Since all the (shortest) paths from a vertex
$v \in V \setminus V_{cid}$ to a vertex in $V_{cid}$ pass through
rep[v], the following is a corollary of the theorem.



Corollary 4.4. For each vertex $v \in V \setminus V_{cid}$ with
rep[v] ≠ null, $d_{\overline{G'_{cid}}}(v, rep[v]) = d_{G'}(v, rep[v])$,
which is different from ∞. Furthermore, for a vertex $w \in V$
which is also represented in $G'_{cid}$ but not in the connected
component of $\overline{G'_{cid}}$ containing v, $d_{G'}(v,w)$ is
equal to

$$d_{G'}(v, rep[v]) + d_{G'}(rep[v], rep[w]) + d_{G'}(rep[w], w).$$

If $w \in V_{cid}$, the last term on the right is 0, since rep[w] = w.

To correctly update the new centrality values, we compute two
extra values for each vertex $u \in V_{cid}$:

$$R[u] = |\{v \in V : rep[v] = u\}|, \tag{2}$$
$$RF[u] = \sum_{v \in V,\ rep[v] = u} d_{G'}(u, v). \tag{3}$$

That is, R[u] is the number of vertices in V which are
represented by u (including u), and RF[u] is the farness of u to
these vertices in G'. The modified update algorithm is given in
Algorithm 3.
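A minimal sketch of how rep, R, and RF could be computed in one pass, assuming the hypothetical CRS layout with per-edge component ids of Π' (comp[k] for the edge at adj[k]) and a flag array marking $V_{cid}$. It runs a single multi-source BFS seeded with all vertices of $V_{cid}$ that only follows edges outside $E'_{cid}$, i.e., it traverses $\overline{G'_{cid}}$; Theorem 4.3 guarantees that each vertex is reached from at most one seed, and Corollary 4.4 that its BFS depth is $d_{G'}(rep[v], v)$. A rep value of -1 stands for null.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CRS graph layout (here: the updated graph G'). */
typedef struct {
    int n;
    const int *xadj;
    const int *adj;
} Graph;

/* Fills rep[v] (-1 = null), R[u] as in (2) and RF[u] as in (3). */
void compute_representatives(const Graph *g, const int *comp, int cid,
                             const unsigned char *in_cid,
                             int *rep, long *R, long *RF) {
    int *dist = malloc((size_t)g->n * sizeof *dist);
    int *queue = malloc((size_t)g->n * sizeof *queue);
    assert(dist != NULL && queue != NULL);
    int head = 0, tail = 0;
    for (int v = 0; v < g->n; v++) {
        rep[v] = -1;
        dist[v] = -1;
        R[v] = 0;
        RF[v] = 0;
        if (in_cid[v]) {             /* rep[u] = u for every u in V_cid */
            rep[v] = v;
            dist[v] = 0;
            queue[tail++] = v;
        }
    }
    while (head < tail) {
        int x = queue[head++];
        R[rep[x]] += 1;              /* x is represented by rep[x] */
        RF[rep[x]] += dist[x];       /* dist[x] = d_G'(rep[x], x) */
        for (int k = g->xadj[x]; k < g->xadj[x + 1]; k++) {
            int w = g->adj[k];
            if (comp[k] == cid || dist[w] >= 0) continue;  /* skip E'_cid, visited */
            dist[w] = dist[x] + 1;
            rep[w] = rep[x];
            queue[tail++] = w;
        }
    }
    free(dist);
    free(queue);
}
```

For a 4-cycle 0-1-2-3 with the chord 02 as the component cid and a path 3-4-5 hanging off vertex 3, it gives rep[4] = rep[5] = 3, R[3] = 3, and RF[3] = 3, analogous to R[d] and RF[d] in Figure 4.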

Algorithm 3: Update with BCD and level differences

Data: G = (V, E), Π, cc[.], far[.], uv
Output: cc'[.], far'[.]
1   ▷ prepare for filtering
    G' ← (V, E') where E' ← E ∪ {uv}
    cSet_u ← {Π(uw) : w ∈ Γ_G(u)}
    cSet_v ← {Π(vw) : w ∈ Γ_G(v)}
    if cSet_u ∩ cSet_v ≠ ∅ then
        cid ← the common component
        Π'(e) ← Π(e), ∀e ∈ E;  Π'(uv) ← cid
    else construct Π' from G';  cid ← Π'(uv)
    V_cid ← {v ∈ V : cid ∈ {Π'(vw) : vw ∈ E'}}
    E'_cid ← {e ∈ E' : Π'(e) = cid}
    G'_cid ← (V_cid, E'_cid)
    G_cid ← (V_cid, E'_cid \ {uv})
    set rep[v], ∀v ∈ V
    R[u] ← |{v ∈ V : rep[v] = u}|, ∀u ∈ V_cid
    RF[u] ← Σ_{v ∈ V, rep[v] = u} d_G'(u, v), ∀u ∈ V_cid
    du[.] ← SSSP(G_cid, u),  dv[.] ← SSSP(G_cid, v)
2   ▷ update phase
    for each s ∈ V_cid do
        if |du[s] − dv[s]| ≤ 1 then
            far'[s] ← far[s],  cc'[s] ← cc[s]
        else
3           Q ← empty queue
            d[v] ← ∞, ∀v ∈ V_cid \ {s}
            Q.push(s), d[s] ← 0
            far'[s] ← RF[s]
            while Q is not empty do
                v ← Q.pop()
                for all w ∈ Γ_G'_cid(v) do
                    if d[w] = ∞ then
                        Q.push(w)
                        d[w] ← d[v] + 1
4                       far'[s] ← far'[s] + (d[w] × R[w]) + RF[w]
            cc'[s] ← 1 / far'[s]
5   ▷ fix phase
    for each v ∈ V \ V_cid do
        r ← rep[v]
        if r ≠ null and far[r] ≠ far'[r] then
6           far'[v] ← far[v] − (far[r] − far'[r])
            cc'[v] ← 1 / far'[v]
        else
            far'[v] ← far[v],  cc'[v] ← cc[v]
    return cc'[.], far'[.]
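The update and fix phases of Algorithm 3 (lines 3 to 6) can be sketched in C as below, assuming the hypothetical CRS layout with per-edge component ids of Π' and the rep, R, and RF arrays described above (rep = -1 for null). The sketch recomputes far' for one source s; applying the level filter and the identical-vertex classes before calling it is left to the caller, and the function names are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CRS graph layout (here: the updated graph G'). */
typedef struct {
    int n;
    const int *xadj;
    const int *adj;
} Graph;

/* Lines 3-4: BFS from s inside G'_cid (edges with comp[k] == cid);
   s contributes RF[s], each newly reached w adds d[w] * R[w] + RF[w]. */
long weighted_farness(const Graph *g, const int *comp, int cid, int s,
                      const long *R, const long *RF) {
    int *dist = malloc((size_t)g->n * sizeof *dist);
    int *queue = malloc((size_t)g->n * sizeof *queue);
    assert(dist != NULL && queue != NULL);
    for (int v = 0; v < g->n; v++) dist[v] = -1;
    int head = 0, tail = 0;
    long far = RF[s];
    dist[s] = 0;
    queue[tail++] = s;
    while (head < tail) {
        int v = queue[head++];
        for (int k = g->xadj[v]; k < g->xadj[v + 1]; k++) {
            int w = g->adj[k];
            if (comp[k] != cid || dist[w] >= 0) continue;
            dist[w] = dist[v] + 1;
            far += (long)dist[w] * R[w] + RF[w];
            queue[tail++] = w;
        }
    }
    free(dist);
    free(queue);
    return far;
}

/* Lines 5-6: shift far[v] outside V_cid by the change of its
   representative's farness, then set cc' = 1 / far' for every vertex.
   far_new must already hold far' for all vertices of V_cid. */
void fix_phase(int n, const unsigned char *in_cid, const int *rep,
               const long *far_old, long *far_new, double *cc_new) {
    for (int v = 0; v < n; v++) {
        if (in_cid[v]) continue;
        int r = rep[v];
        if (r >= 0 && far_old[r] != far_new[r])
            far_new[v] = far_old[v] - (far_old[r] - far_new[r]);
        else
            far_new[v] = far_old[v];      /* unaffected by the insertion */
    }
    for (int v = 0; v < n; v++)
        cc_new[v] = far_new[v] > 0 ? 1.0 / (double)far_new[v] : 0.0;
}
```

For a 4-cycle 0-1-2-3 with a pendant path 0-4-5, inserting the chord 02 lowers far[0] from 7 to 6, and the fix phase shifts far[4] and far[5] from 9 and 13 to 8 and 12, which matches a full recomputation on G'.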



Lemma 4.5. For each vertex $v \in V_{cid}$, Algorithm 3 computes the
correct cc'[v] value.

Proof. We will prove that far'[v] is correct for all
$v \in V_{cid}$. Let v = s be the vertex whose closeness centrality
update is started at line 3. At line 4 of Algorithm 3, the update
on far'[v] is $d_{G'}(v,w) \times R[w] + RF[w]$ (for w = v, this term
reduces to RF[v]), which can be rewritten as

$$\sum_{u \in V,\ rep[u] = w} \left( d_{G'}(v,w) + d_{G'}(w,u) \right)$$

by using (2) and (3). According to Corollary 4.4, this is equal to

$$\sum_{u \in V,\ rep[u] = w} d_{G'}(v,u).$$

Due to the definition of rep, only the vertices which are
connected to v will have an effect on far'[v]. And due to
Theorem 4.3, each vertex can contribute to at most one update.
Hence,

$$\sum_{w \in V_{cid}} \ \sum_{u \in V,\ rep[u] = w} d_{G'}(v,u) = \sum_{u \in V,\ d_{G'}(v,u) \neq \infty} d_{G'}(v,u),$$

which is far'[v] in G', as desired. □

Lemma 4.6. For each vertex $v \in V \setminus V_{cid}$, Algorithm 3
computes the correct cc'[v] value.

Proof. We will prove that far'[v] is correct for all
$v \in V \setminus V_{cid}$ after the fix phase. Let u = rep[v]. If u
is null, then v's farness, and hence its closeness value, remains
the same.

Assume that u is not null. Let w be a vertex with rep[w] ≠ null.
If w and v are in the same connected component of
$\overline{G'_{cid}}$, then $d_G(v,w) = d_{G'}(v,w)$ and
$d_G(u,w) = d_{G'}(u,w)$. Hence, the changes in far[v] and far[u]
due to w are both 0. On the other hand, if w is in a different
connected component of $\overline{G'_{cid}}$, then according to
Corollary 4.4,

$$d_{G'}(v,w) = d_{G'}(v,u) + d_{G'}(u, rep[w]) + d_{G'}(rep[w], w),$$

where the sum of the second and third terms is equal to
$d_{G'}(u,w)$. Since the first term does not change with the
insertion of uv, the change in $d_{G'}(u,w)$ is equal to the change
in $d_{G'}(v,w)$. That is, when aggregated, the change in far[v] is
equal to the change in far[u]. Lemma 4.5 implies that far'[u] is
correct. Hence, far'[v], computed at line 6, must also be
correct. □

Theorem 4.7. For each vertex $v \in V$, Algorithm 3 computes the
correct cc'[v] value.

Proof. Follows from Lemmas 4.5 and 4.6. □

The worst-case complexity of the update algorithm is
O(n(m + n)), and the overhead of filter preparation (lines 1
through 2) is O(m + n), since it only contains a constant number
of graph traversals. In case of an edge deletion, it is enough to
take $G'_{cid}$ as the biconnected component which contained the
deleted edge. The rest of the procedure can be adapted in a
straightforward manner.

4.1.3 Filtering with identical vertices 

Our preliminary analyses on various networks show that some of
the graphs contain a significant number of identical vertices,
i.e., vertices which have the same neighborhood structure. This
can be exploited to further reduce the number of SSSPs. We
investigate two types of identical vertices.



Definition 4.8. In a graph G, two vertices u and v are
type-I-identical if and only if $\Gamma_G(u) = \Gamma_G(v)$.

Definition 4.9. In a graph G, two vertices u and v are
type-II-identical if and only if
$\{u\} \cup \Gamma_G(u) = \{v\} \cup \Gamma_G(v)$.

Both types are equivalence relations, since they are reflexive,
symmetric, and transitive. Furthermore, the non-trivial classes
they form (i.e., the ones containing more than one vertex) are
disjoint across the two types.

Let $u, v \in V$ be two identical vertices. One can see that for
any vertex $w \in V \setminus \{u, v\}$, $d_G(u,w) = d_G(v,w)$. Since,
in addition, $d_G(u,v) = d_G(v,u)$, we have far[u] = far[v], and
the following is true.

Corollary 4.10. Let $X \subseteq V$ be a vertex class containing
type-I or type-II identical vertices. Then the closeness
centrality values of all the vertices in X are equal.

To construct these equivalence classes for the initial graph, we
first use a hash function to map each vertex neighborhood to an
integer: $hash_I[u] = \sum_{v \in \Gamma_G(u)} v$. We then sort the
vertices with respect to their hash values and construct the
type-I vertex classes by eliminating false positives due to
collisions of the hash function. A similar process is applied to
detect type-II vertex classes. The complexity of this initial
construction is O(n log n + m), assuming the number of
collisions is small and, hence, the false-positive detection
cost is negligible.
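The hash-and-verify construction of type-I classes can be sketched in C as follows. It assumes sorted adjacency lists in a hypothetical CRS layout, uses $hash_I[u] = \sum_{v \in \Gamma_G(u)} v$ as above, and breaks ties by vertex id so that each class is labeled by its smallest vertex. Type-II classes could be built the same way with $hash_I[u] + u$ and a comparison of closed neighborhoods; that variant is not shown, and the names here are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CRS graph layout with sorted adjacency lists. */
typedef struct {
    int n;
    const int *xadj;
    const int *adj;
} Graph;

typedef struct { long hash; int v; } HashedVertex;

/* Order by hash, then by vertex id. */
static int cmp_hashed(const void *a, const void *b) {
    const HashedVertex *x = a, *y = b;
    if (x->hash != y->hash) return (x->hash > y->hash) - (x->hash < y->hash);
    return (x->v > y->v) - (x->v < y->v);
}

/* Exact test Gamma(u) == Gamma(v) on sorted adjacency lists. */
static int same_neighborhood(const Graph *g, int u, int v) {
    int deg = g->xadj[u + 1] - g->xadj[u];
    if (deg != g->xadj[v + 1] - g->xadj[v]) return 0;
    for (int i = 0; i < deg; i++)
        if (g->adj[g->xadj[u] + i] != g->adj[g->xadj[v] + i]) return 0;
    return 1;
}

/* cls[v] = smallest vertex id in v's type-I class. */
void type1_classes(const Graph *g, int *cls) {
    HashedVertex *hv = malloc((size_t)g->n * sizeof *hv);
    assert(hv != NULL);
    for (int u = 0; u < g->n; u++) {
        long h = 0;
        for (int k = g->xadj[u]; k < g->xadj[u + 1]; k++) h += g->adj[k];
        hv[u].hash = h;                 /* hash_I[u] = sum of neighbor ids */
        hv[u].v = u;
        cls[u] = u;
    }
    qsort(hv, (size_t)g->n, sizeof *hv, cmp_hashed);
    for (int i = 0; i < g->n;) {
        int j = i;
        while (j < g->n && hv[j].hash == hv[i].hash) j++;  /* run [i, j) */
        for (int a = i; a < j; a++) {
            int x = hv[a].v;
            if (cls[x] != x) continue;                     /* already placed */
            for (int b = a + 1; b < j; b++) {
                int y = hv[b].v;
                if (cls[y] == y && same_neighborhood(g, x, y))
                    cls[y] = x;                            /* not a false positive */
            }
        }
        i = j;
    }
    free(hv);
}
```

For example, if Γ(0) = Γ(1) = {2, 3} and Γ(2) = {0, 1, 4}, all three vertices hash to 5, but the exact comparison keeps 2 out of the class {0, 1}.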

Maintaining the equivalence classes upon edge insertions and
deletions is easy: for example, when uv is added to G, we first
remove u and v from their classes and insert them into new ones
(or leave them as singletons if no vertex is now identical to
them). The cost of this maintenance is O(n + m).

While updating the closeness centralities of the vertices in V,
we execute an SSSP at line 3 of Algorithm 3 for at most one
vertex from each class. For the rest of the vertices, we use the
same closeness centrality value. The improvement is
straightforward and the modifications are minor. For brevity, we
do not give the pseudocode.

4.2 SSSP Hybridization 

The spike-shaped distribution given in Figure 3 can also be
exploited for SSSP hybridization. Consider the execution of
Algorithm 1: while executing an SSSP with source s, for each
vertex pair u, v, u is processed before v if
$d_G(s,u) < d_G(s,v)$. That is, Algorithm 1 consecutively uses the
vertices with distance k to find the vertices with distance
k + 1. Hence, it visits the vertices in a top-down manner. SSSP
can also be performed in a bottom-up manner. That is to say,
after all distance (level) k vertices are found, the vertices
whose levels are unknown can be processed to see if they have a
neighbor at level k.

Figure 5 gives the execution times of the bottom-up and top-down
SSSP variants for processing each level. The trend for top-down
resembles the shortest distance distribution in small-world
networks. This is expected, since at each level ℓ, the vertices
that are ℓ steps away from s are processed. On the other hand,
for the bottom-up variant, the execution time decreases since
the number of unprocessed nodes decreases. Following the idea of
Beamer et al. [1], we hybridize the SSSPs throughout the
centrality update phase in Algorithm 3. We simply compare the
number of edges that need to be processed by each variant and
choose the cheaper one. For the case presented in Figure 5, the
hybrid algorithm is 3.6 times faster than the top-down variant.
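A minimal sketch of such a hybrid SSSP in C, under a hypothetical CRS layout. At each level it compares the number of edges a top-down step would scan (the edges of the current frontier) with the number a bottom-up step would scan (the edges of the still-unvisited vertices) and runs the cheaper step, as described above. Beamer et al. [1] use tuned switching heuristics; the plain edge-count comparison here, which also ignores the O(n) scan for unvisited vertices, is a simplification.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CRS graph layout. */
typedef struct {
    int n;
    const int *xadj;
    const int *adj;
} Graph;

/* Level-synchronous BFS from s returning far[s]; each level runs
   whichever of top-down or bottom-up scans fewer edges. */
long hybrid_farness(const Graph *g, int s) {
    int n = g->n;
    int *dist = malloc((size_t)n * sizeof *dist);
    int *frontier = malloc((size_t)n * sizeof *frontier);
    int *next = malloc((size_t)n * sizeof *next);
    assert(dist != NULL && frontier != NULL && next != NULL);
    for (int v = 0; v < n; v++) dist[v] = -1;
    dist[s] = 0;
    frontier[0] = s;
    int fsize = 1;
    long far = 0;
    long unvisited_edges = g->xadj[n] - (g->xadj[s + 1] - g->xadj[s]);
    for (int level = 0; fsize > 0; level++) {
        long td_edges = 0;
        for (int i = 0; i < fsize; i++)
            td_edges += g->xadj[frontier[i] + 1] - g->xadj[frontier[i]];
        int nsize = 0;
        if (td_edges <= unvisited_edges) {
            /* top-down: expand every frontier vertex */
            for (int i = 0; i < fsize; i++) {
                int v = frontier[i];
                for (int k = g->xadj[v]; k < g->xadj[v + 1]; k++) {
                    int w = g->adj[k];
                    if (dist[w] < 0) { dist[w] = level + 1; next[nsize++] = w; }
                }
            }
        } else {
            /* bottom-up: each unvisited vertex looks for a neighbor at this level */
            for (int w = 0; w < n; w++) {
                if (dist[w] >= 0) continue;
                for (int k = g->xadj[w]; k < g->xadj[w + 1]; k++)
                    if (dist[g->adj[k]] == level) {
                        dist[w] = level + 1;
                        next[nsize++] = w;
                        break;
                    }
            }
        }
        for (int i = 0; i < nsize; i++) {
            far += level + 1;
            unvisited_edges -= g->xadj[next[i] + 1] - g->xadj[next[i]];
        }
        int *tmp = frontier; frontier = next; next = tmp;
        fsize = nsize;
    }
    free(dist);
    free(frontier);
    free(next);
    return far;
}
```

On a star, a BFS from a leaf takes one top-down step to reach the center and then switches to bottom-up, since the center's edges outnumber those of the remaining leaves.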
Figure 5: Execution times of bottom-up, top-down, and hybrid
SSSPs at each level for the wiki-Talk graph. The hybrid version
is 3.63 and 4.59 times faster than the top-down and bottom-up
versions, respectively.

5. RELATED WORK 

To the best of our knowledge, there are only two works that deal
with maintaining centrality in dynamic networks, and both are
concerned with betweenness centrality. Lee et al. proposed the
QUBE framework, which updates betweenness centrality upon edge
insertions and deletions within the network [17]. QUBE relies on
the biconnected component decomposition of the graph. Upon an
edge insertion or deletion, assuming that the decomposition does
not change, only the centrality values within the updated
biconnected component are recomputed from scratch. If the edge
insertion/deletion affects the decomposition, the modified graph
is decomposed into its biconnected components and the centrality
values in the affected part are recomputed. The distribution of
the vertices to the biconnected components is an important
criterion for the performance of QUBE. If a large component
exists, which is the case for many real-life networks, one
should not expect a significant reduction in update time.
Unfortunately, the performance of QUBE is only reported on small
graphs (fewer than 100K edges) with very low edge density. In
other words, it only performs significantly well on small graphs
with a tree-like structure having many small biconnected
components.

Green et al. proposed a technique to update centrality scores
rather than recomputing them from scratch upon edge insertions
(it can be extended to edge deletions) [10]. The idea is to store
the whole data structure used by the previous betweenness
centrality update kernel. This storage is useful for two main
reasons. First, it avoids a significant amount of recomputation,
since some of the centrality values will stay the same. Second,
it enables a partial traversal of the graph even when an update
is necessary. However, as the authors state, O(n^2 + nm) values
must be kept on the disk. For the Wikipedia user communication
and DBLP coauthorship networks, which each contain more than a
million vertices, the technique by Green et al. requires
terabytes of memory. The largest graph used in [10] has
approximately 20K vertices and 200K edges; the quadratic storage
cost prevents their storage-based techniques from scaling any
higher. On the other hand, the memory footprint of our
algorithms is linear, and hence they are much more practical.
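To put the O(n^2 + nm) storage in perspective, here is a rough estimate for the DBLP network (n ≈ 1.2 × 10^6, from the abstract) that keeps only the n^2 term and assumes 4 bytes per stored value; both simplifications are assumptions for illustration, not figures reported in [10]:

```latex
n^2 = (1.2 \times 10^{6})^2 = 1.44 \times 10^{12} \ \text{values},
\qquad
1.44 \times 10^{12} \times 4\,\text{B} = 5.76 \times 10^{12}\,\text{B} \approx 5.8\,\text{TB}.
```

The nm term only adds to this lower bound.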

6. EXPERIMENTAL RESULTS 

We implemented our algorithms in C. The code is compiled with gcc
v4.6.2 and the optimization flags -O2 -DNDEBUG. The graphs are
kept in memory in the compressed row storage (CRS) format. The
experiments are run on a computer with two Intel Xeon E5520 CPUs
clocked at 2.27GHz and equipped with 48GB of main memory. All the
experiments are run sequentially.

Table 1: The graphs used in the experiments. Column Org. shows
the initial closeness computation time of CC, and column Best
shows the best update time we obtain in case of streaming data.

For the experiments, we used 10 networks from the UFL Sparse Matrix Collection¹, and we also extracted the coauthor
network from the current set of DBLP papers. Properties of the graphs are summarized in Table 1. We symmetrized
the directed graphs. The graphs are listed by increasing number of edges, and a distinction is made between small
graphs (with fewer than 500K edges) and large graphs (with more than 500K edges).

6.1 Handling topology modifications 

To assess the effectiveness of our algorithms, we need to know when each edge is inserted into or deleted from the
graph. Our datasets from the UFL Sparse Matrix Collection do not have this information. To conduct our experiments on
these datasets, we delete 1,000 edges from a graph, chosen randomly in the following way: a vertex $u \in V$ is selected
uniformly at random, and a vertex $v \in \Gamma_G(u)$ is selected uniformly at random. Since we do not want to change the
connectivity of the graph (having disconnected components can make our algorithms much faster, which would not be fair
to CC), we discard $uv$ if it is a bridge. Otherwise, we delete it from $G$ and continue. We construct the initial graph
by deleting these 1,000 edges. Each edge is then inserted one by one, and our algorithms are used to recompute the
closeness centrality after each insertion. Besides these random insertion experiments, we also evaluated our algorithms
on a real temporal dataset of the DBLP coauthor graph². In this graph, there is an edge between two authors if they
published a paper together. Publication dates are used as timestamps of edges. We first constructed the graph for the
papers published before January 1, 2013. Then, we inserted the coauthorship edges of the papers published since then.
Although our experiments perform edge insertion, edge deletion is a very similar process, which should give comparable
results.
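The edge sampling procedure above can be sketched as follows, assuming the graph is connected and stored as a dict of neighbor sets. For clarity, the bridge test here simply re-checks connectivity with a BFS after each tentative deletion, which costs $O(n + m)$ per trial; this is a swapped-in simple check, not necessarily what our C implementation does, and both function names are hypothetical.

```python
import random
from collections import deque


def is_connected(adj):
    """BFS connectivity test for an undirected graph stored as dict[vertex, set]."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return len(seen) == len(adj)


def sample_non_bridge_deletions(adj, k, seed=0):
    """Delete k random non-bridge edges from adj in place; return them in order."""
    if not is_connected(adj):
        raise ValueError("the input graph must be connected")
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    # A connected graph keeps at least n - 1 edges (a spanning tree), so at most
    # m - n + 1 edges can be deleted without disconnecting it.
    if k > m - n + 1:
        raise ValueError("cannot delete that many edges and stay connected")
    rng = random.Random(seed)
    vertices = sorted(adj)
    deleted = []
    while len(deleted) < k:
        u = rng.choice(vertices)           # u uniform over V
        v = rng.choice(sorted(adj[u]))     # v uniform over the neighbors of u
        adj[u].discard(v)
        adj[v].discard(u)
        if is_connected(adj):
            deleted.append((u, v))         # not a bridge: keep it deleted
        else:
            adj[u].add(v)                  # bridge: restore it and resample
            adj[v].add(u)
    return deleted
```

Reinserting the returned edges one at a time then produces an insertion stream of the kind used in the random insertion experiments.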

In addition to CC, we configure our algorithms in four different ways: CC-B only uses biconnected component
decomposition (BCD); CC-BL uses BCD and filtering with levels; CC-BLI uses all three work filtering techniques,
including identical vertices; and CC-BLIH uses all the techniques described in this paper, including SSSP
hybridization.

¹ http://www.cise.ufl.edu/research/sparse/matrices/
² http://www.informatik.uni-trier.de/~ley/db/



Table 2 presents the results of the experiments. The second column, CC, shows the time to run the full Brandes
algorithm for computing closeness centrality on the original version of the graph. Columns 3-6 of the table present
absolute runtimes (in seconds) of the centrality computation algorithms. The next four columns, 7-10, give the speedups
achieved by each configuration. For instance, on average, updating the closeness values by using CC-B on PGPgiantcompo
is 11.5 times faster than running CC. Finally, the last column gives the overhead of our algorithms per edge insertion,
i.e., the time necessary to detect the vertices to be updated and to maintain the BCD and identical-vertex classes.
Geometric means of these times and speedups are also given to allow comparison across instances.
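The speedup columns and their geometric means follow from the runtime columns with arithmetic like the minimal sketch below. The helper names and the runtimes in the usage line are hypothetical, not values from Table 2.

```python
import math


def speedup(baseline_seconds, update_seconds):
    """How many times faster an update is than recomputing with CC."""
    return baseline_seconds / update_seconds


def geometric_mean(values):
    """exp of the mean log; equivalent to the n-th root of the product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-graph speedups (25x and 9x), for illustration only.
print(round(geometric_mean([speedup(100.0, 4.0), speedup(90.0, 10.0)]), 6))  # 15.0
```

The geometric mean suits ratios: the geometric mean of the inverse speedups is the inverse of the geometric mean, so the summary does not depend on which algorithm is taken as the baseline. Python 3.8+ also ships `statistics.geometric_mean`.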

The times to compute closeness centrality using CC on the small graphs range from 1 to 77 seconds. On large graphs,
the times range from 13 minutes to 49 hours. Clearly, CC is not suitable for real-time network analysis and management
based on shortest paths and closeness centrality. When all the techniques are used (CC-BLIH), the time necessary to
update the closeness centrality values of the small graphs drops below 3 seconds per edge insertion. The improvements
range from a factor of 27.2 (cond-mat-2005) to 111.2 (PGPgiantcompo), with an average improvement of 43.5 across small
instances. On large graphs, the update time per insertion drops below 16 minutes for all graphs. The improvements range
from a factor of 42.6 (loc-gowalla) to 458.8 (DBLP-coauthor), with an average of 99.7. For all graphs, the time spent
filtering the work is below one second, which indicates that the majority of the time is spent on SSSPs. Note that this
part is pleasingly parallel, since each SSSP is independent of the others.
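Because each SSSP is independent, the per-source work can be handed to separate workers. The sketch below assumes an unweighted graph (so each SSSP is a BFS) and closeness defined as the reciprocal of farness, the sum of distances to all reachable vertices; the function names are hypothetical. In CPython, a `ThreadPoolExecutor` only illustrates the independence (the GIL serializes the pure-Python BFS), so real speedups would need processes or native threads.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def farness_from(adj, source):
    """BFS from source; return (source, sum of distances to reachable vertices)."""
    dist = {source: 0}
    queue = deque([source])
    total = 0
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                total += dist[y]
                queue.append(y)
    return source, total


def closeness_for(adj, sources, workers=4):
    """Run one independent BFS per source concurrently; closeness = 1 / farness."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda s: farness_from(adj, s), sources)
        return {s: (1.0 / far if far else 0.0) for s, far in results}
```

In an incremental setting, `sources` would be only the vertices left after work filtering, so the parallel part shrinks along with the filtered work.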

The overall improvement obtained by the proposed algorithms is very significant. The speedups obtained by using BCDs
(CC-B) are 3.5 and 3.2 on average for small and large graphs, respectively. The graphs PGPgiantcompo and wiki-Talk
benefit the most from BCDs (with speedups of 11.5 and 6.8, respectively). Clearly, using the biconnected component
decomposition improves the update performance. However, filtering by level differences is the most efficient
technique: CC-BL brings major improvements over CC-B. For all social networks, CC-BL increased the performance compared
with CC-B; the speedups range from 4.8 (web-NotreDame) to 64 (DBLP-coauthor). Overall, CC-BL brings a 7.61-fold
improvement on small graphs and a 13.44-fold improvement on large graphs over CC.

For each added edge $uv$, let $X$ be the random variable equal to $|d_G(u,w) - d_G(v,w)|$. Using 1,000 $uv$ edges,
we computed the probabilities of the three cases we investigated before and give them in Fig. 6. For each graph in the
figure, the sum of the first two columns gives the ratio of the vertices not updated by CC-BL. For the networks in the
figure, fewer than 20% of the vertices require an update ($\Pr(X > 1)$). This explains the speedup achieved by
filtering using level differences. Therefore, level filtering is more useful for graphs having characteristics similar
to small-world networks.
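The level-difference test can be sketched in a few lines: run one BFS from each endpoint of the new edge $uv$ on the graph before insertion and bucket every vertex $w$ by $X = |d_G(u,w) - d_G(v,w)|$. Only vertices with $X > 1$ need a new SSSP. This is a minimal sketch assuming a connected, unweighted graph stored as a dict of neighbor sets; the names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted single-source shortest path distances."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def level_filter(adj, u, v):
    """Split vertices by X = |d(u,w) - d(v,w)| computed before inserting uv.

    Returns (x_is_0, x_is_1, needs_update); only the last list needs an SSSP.
    Assumes the graph is connected, so every w is reachable from u and v.
    """
    du, dv = bfs_distances(adj, u), bfs_distances(adj, v)
    x_is_0, x_is_1, needs_update = [], [], []
    for w in adj:
        x = abs(du[w] - dv[w])
        (x_is_0 if x == 0 else x_is_1 if x == 1 else needs_update).append(w)
    return x_is_0, x_is_1, needs_update
```

On a 5-vertex path whose two ends are joined by the new edge, only the middle vertex falls into $X = 0$; indeed its farness is 6 both before and after the path closes into a cycle.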

Filtering with identical vertices is not as useful as the other two techniques in the work filter. Overall, CC-BLI is
1.15 times faster than CC-BL on both small and large graphs. For some graphs, such as web-NotreDame and web-Google, the
improvements are much higher (30% and 31%, respectively).
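One way to find identical-vertex classes is to hash each vertex's neighborhood. The sketch below assumes that two vertices are identical when they have the same open neighborhood, $\Gamma_G(u) = \Gamma_G(v)$, or the same closed neighborhood, $\Gamma_G(u) \cup \{u\} = \Gamma_G(v) \cup \{v\}$. Such vertices have equal farness, so one SSSP per class suffices. The function name is hypothetical, and the grouping relies on Python's built-in hashing of frozensets.

```python
from collections import defaultdict


def identical_vertex_classes(adj):
    """Group vertices with identical open or closed neighborhoods.

    Returns a list of classes (sorted lists) with at least two members.
    In a simple graph, open twins are non-adjacent and closed twins are
    adjacent, so the two kinds of classes never share a vertex.
    """
    open_groups = defaultdict(list)
    closed_groups = defaultdict(list)
    for w, nbrs in adj.items():
        open_groups[frozenset(nbrs)].append(w)           # same Gamma(w)
        closed_groups[frozenset(nbrs | {w})].append(w)   # same Gamma(w) plus w
    classes = []
    for groups in (open_groups, closed_groups):
        classes.extend(sorted(g) for g in groups.values() if len(g) > 1)
    return classes
```

In a star, for example, all leaves share the open neighborhood consisting of the center, so they form a single class and one BFS covers all of them.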
Table 2: Execution times in seconds of all the algorithms and speedups when compared with the basic closeness
centrality algorithm CC. In the table, CC-B is the variant which uses only BCDs, CC-BL uses BCDs and filtering
with levels, CC-BLI uses all three work filtering techniques including identical vertices, and CC-BLIH uses all the
techniques described in this paper including SSSP hybridization.

Figure 6: The bars show the distribution of the random variable $X = |d_G(u,w) - d_G(v,w)|$ into the three cases
we investigated when an edge $uv$ is added.

Figure 7: Sorted list of the runtimes per edge insertion for the first 100 added edges of web-NotreDame.



Finally, the hybrid implementation of SSSP also proved to be useful. CC-BLIH is faster than CC-BLI by a factor of
1.42 on small graphs and by a factor of 1.96 on large graphs. Although it seems to improve the performance for all
graphs, in a few cases the improvement is not significant. This can be attributed to incorrect decisions on the SSSP
variant to be used. Indeed, we did not benchmark the architecture to discover the proper parameter. CC-BLIH performs
the best on social network graphs, with improvement ratios of 3.18 (soc-sign-epinions), 2.54 (loc-gowalla), and
2.30 (wiki-Talk).
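The switching rule is not detailed here, so the sketch below is only one plausible form of SSSP hybridization, assuming it alternates between top-down and bottom-up BFS steps in the style of direction-optimizing BFS. The threshold `alpha` is hypothetical; it plays the role of the architecture-dependent parameter mentioned above. Both branches compute the same distances, so a poor choice of `alpha` only costs time, which matches the observation that wrong decisions reduce the gain rather than break correctness.

```python
def hybrid_bfs(adj, source, alpha=4):
    """BFS distances that switch between top-down and bottom-up steps.

    A level is expanded top-down while the frontier is small
    (len(frontier) * alpha < n) and bottom-up otherwise.
    """
    n = len(adj)
    dist = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        nxt = []
        if len(frontier) * alpha < n:
            # Top-down: scan the neighbors of every frontier vertex.
            for x in frontier:
                for y in adj[x]:
                    if y not in dist:
                        dist[y] = level
                        nxt.append(y)
        else:
            # Bottom-up: every unvisited vertex looks for a parent in the frontier.
            in_frontier = set(frontier)
            for y in adj:
                if y not in dist and any(x in in_frontier for x in adj[y]):
                    nxt.append(y)
            for y in nxt:
                dist[y] = level
        frontier = nxt
    return dist
```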

All the previous results present the average update time for 1,000 successively added edges. Hence, they say nothing
about the variance. Figure 7 shows the runtimes of CC-B and CC-BLIH per edge insertion for web-NotreDame in sorted
order. The runtime distribution of CC-B clearly has multiple modes: either the runtime is lower than 100 milliseconds
or it is around 700 seconds. We see here the benefit of BCD. According to the runtime distribution, about 59% of
web-NotreDame's vertices are inside small biconnected components. Hence, the time per edge insertion drops from 2,845
seconds to 700. Indeed, the largest component only contains 41% of the vertices and 76% of the edges of the original
graph. The decrease in the size of the components accounts for the performance gain.

The impact of level filtering can also be seen in Figure 7. 60% of the edges in the main biconnected component do not
change the closeness values of many vertices, and the updates induced by their addition take less than 1 second. The
remaining edges trigger more expensive updates upon insertion. Within these 30% expensive edge insertions, identical
vertices and SSSP hybridization provide a significant improvement (not shown in the figure).



Better Speedups on Real Temporal Data. 

The best speedups are obtained on the DBLP coauthor network, which uses real temporal data. Using CC-B, we reach a 6.2
speedup w.r.t. CC, which is higher than the average speedup on all networks. The main reason for this behavior is that
10% of the inserted edges actually connect new vertices joining the network, i.e., authors with their first
publication, and CC-B handles these edges quite fast. Applying CC-BL gives a 64.8 speedup over CC-B, which is
drastically higher than on all other graphs. Indeed, only 0.7% of the vertices require an SSSP run when an edge is
inserted into the DBLP network. For the synthetic cases, this number is 12%. CC-BLI provides speedups similar to those
obtained with random insertions, and CC-BLIH does not provide speedups because of the structure of the graph. Overall,
the speedup obtained with real temporal data reaches 460.8, i.e., 4.6 times greater than the average speedup on all
graphs. Our algorithms appear to perform much better on real applications than on synthetic ones.

6.2 Summary 

All the techniques presented in this paper allow closeness centrality to be updated faster than with the
non-incremental algorithm presented in [2], by a factor of 43.5 on small graphs and 99.7 on large ones. Small-world
networks such as social networks benefit greatly from the proposed techniques. They tend to have a biconnected
component structure that yields some improvement with CC-B. However, they usually have a large biconnected component,
and most of the gain is still derived from exploiting their spike-shaped distance distribution, which brings at least
a factor of 13.4. Identical vertices typically bring a small improvement but help to increase the performance during
expensive updates. Using all the techniques, we reduced the closeness centrality update time from 2 days to 16 minutes
for the graph with the most vertices in our dataset (wiki-Talk). And for the temporal DBLP coauthorship graph, which
has the most edges, we reduced the centrality update time from 1.3 days to 4.2 minutes.

7. CONCLUSION 

In this paper, we propose the first algorithms to achieve fast updates of exact centrality values upon incremental
network modifications at such a large scale. Our techniques exploit the biconnected component decomposition of these
networks, their spike-shaped shortest-distance distributions, and the existence of nodes with identical neighborhoods.
In large networks with more than 500K edges, our techniques proved to bring a 99-fold speedup on average. With a
speedup of 458, the proposed techniques may even allow DBLP to reflect the impact of newly published papers on
centrality in quasi real-time. Our algorithms will serve as a fundamental building block for the centrality-based
network management problem, closeness centrality computations on dynamic/streaming networks, and their temporal
analysis.

The techniques presented in this paper can directly be extended in two ways. First, using statistical sampling to
compute an approximation of closeness centrality only requires a minor adaptation of the SSSP kernel, so that it
computes the contribution of the source vertex to other vertices instead of its own centrality. Second, the techniques
presented here also apply to betweenness centrality with minor adaptations.

As future work, we plan to investigate local search techniques for the centrality-based network management problem
using our incremental centrality computation algorithms.